Rapid proteomic analysis for solid tumors reveals LSD1 as a drug target in an end‐stage cancer patient

Doll et al. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6068348/

Abstract

Recent advances in mass spectrometry (MS)‐based technologies are now set to transform translational cancer proteomics from an idea to a practice. Here, we present a robust proteomic workflow for the analysis of clinically relevant human cancer tissues that allows quantitation of thousands of tumor proteins in several hours of measuring time and a total turnaround of a few days. We applied it to a chemorefractory metastatic case of the extremely rare urachal carcinoma. Quantitative comparison of lung metastases and surrounding tissue revealed several significantly upregulated proteins, among them lysine‐specific histone demethylase 1 (LSD1/KDM1A). LSD1 is an epigenetic regulator and the target of active development efforts in oncology. Thus, clinical cancer proteomics can rapidly and efficiently identify actionable therapeutic options. While currently described for a single case study, we envision that it can be applied broadly to other patients in a similar condition.

Workflow

Figure 2

Figure 2 Proteomics workflow for the case study. (A) Timeline of the project. (B) Experimental design, including source of material, inStageTip sample preparation, and depiction of the analytical workflow

Results

Figure 3

Figure 3 Proteins differentially expressed in the urachal carcinoma lung metastases. (A) Volcano plot of the p‐values (y‐axis) vs. the log2 protein abundance differences (x‐axis) between metastases and control, with significance lines in black and gray corresponding to a 5% and 1% FDR, respectively. (B) Mechanisms of action of LSD1/KDM1A and the proposed inhibitory drug treatment: JATROSOME (tranylcypromine)

Workflow with the Clinical Knowledge Graph

  1. Generate Analysis Report: Proteomics data

  2. Identify Candidate Drug Treatments

  3. Rank Candidates According to Toxicity


Generate Analysis Report: Proteomics Data

Report Manager

[206]:
from ckg.report_manager import project

from plotly.offline import init_notebook_mode, iplot
%matplotlib inline
init_notebook_mode(connected=True)
[207]:
configuration_files = {"proteomics":"../assets/proteomics_CS.yml"}
[208]:
study_case_project = project.Project(identifier="P0000002", configuration_files=configuration_files, datasets={}, knowledge=None, report={})
[209]:
study_case_project.build_project(force=False)
[210]:
study_case_project.generate_report()
[211]:
study_case_project.show_report(environment='notebook')
[211]:
defaultdict(list,
            {'PROJECT INFORMATION': [],
             'PROTEOMICS': [],
             'MULTIOMICS': [],
             'KNOWLEDGE GRAPH': []})
[212]:
study_case_project.list_datasets()
[212]:
dict_keys(['multiomics', 'proteomics'])
[213]:
proteomics_dataset = study_case_project.get_dataset(dataset='proteomics')
[214]:
proteomics_dataset.list_dataframes()
[214]:
['go annotation',
 'number of modified proteins',
 'number of peptides',
 'number of proteins',
 'original',
 'pathway annotation',
 'processed',
 'protein biomarkers',
 'regulated',
 'regulation table',
 'tissue qcmarkers']

In this case, we use the regulation table to extract the proteins upregulated in the metastatic tissue compared to non-cancerous tissue.

[215]:
regulation_table = proteomics_dataset.get_dataframe(dataset_name='regulation table')
[216]:
regulation_table.head()
[216]:
-log10 pvalue FC Method T-statistics correction dfk dfn effsize efftype group1 ... identifier log2FC mean(group1) mean(group2) padj pvalue rejected s0 std(group1) std(group2)
0 0.487413 -1.277214 SAMR Two class paired 0.153337 permutation FDR (4.0 perm) 1 2 -0.738 hedges CANCER ... A1BG~P04217 -0.353 31.710 32.063 0.714951 0.325527 False 2 0.047 0.384
1 1.438878 3.610003 SAMR Two class paired -0.924233 permutation FDR (4.0 perm) 1 2 2.913 hedges CANCER ... A1CF~Q9NQ94 1.852 26.563 24.711 0.052925 0.036402 False 2 0.360 0.366
2 3.001177 -1.926524 SAMR Two class paired 0.463202 permutation FDR (4.0 perm) 1 2 -18.081 hedges CANCER ... A2M~P01023 -0.946 34.287 35.233 0.271411 0.000997 False 2 0.033 0.026
3 1.192262 2.801113 SAMR Two class paired -0.594485 permutation FDR (4.0 perm) 1 2 2.145 hedges CANCER ... AAAS~Q9NRG9 1.486 26.317 24.831 0.149537 0.064230 False 2 0.173 0.532
4 0.321607 -1.786332 SAMR Two class paired 0.346611 permutation FDR (4.0 perm) 1 2 -0.496 hedges CANCER ... AACS~Q86V21 -0.837 26.350 27.187 0.423692 0.476862 False 2 0.625 1.211

5 rows × 21 columns

[217]:
regulation_table[regulation_table['identifier']=='HDAC1~Q13547']
[217]:
-log10 pvalue FC Method T-statistics correction dfk dfn effsize efftype group1 ... identifier log2FC mean(group1) mean(group2) padj pvalue rejected s0 std(group1) std(group2)
1885 0.782856 10.367528 SAMR Two class paired -1.008813 permutation FDR (4.0 perm) 1 2 1.227 hedges CANCER ... HDAC1~Q13547 3.374 30.666 27.292 0.036735 0.164871 True 2 0.301 2.201

1 rows × 21 columns

[218]:
up_regulated_proteins = regulation_table.loc[(regulation_table.rejected) & (regulation_table.FC > 2), ['identifier']]
[219]:
up_regulated_proteins.shape
[219]:
(199, 1)
[220]:
'HDAC1~Q13547' in up_regulated_proteins['identifier'].tolist()
[220]:
True
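
The filter in cell [218] combines two conditions with a boolean mask: the test was rejected (significant) and the fold change exceeds 2. A minimal sketch of the same pattern on a toy table follows; the identifiers and values are made up for illustration and are not taken from the study.

```python
import pandas as pd

# Toy regulation table; identifiers and values are illustrative only.
toy_regulation = pd.DataFrame({
    "identifier": ["P1~X00001", "P2~X00002", "P3~X00003", "P4~X00004"],
    "FC": [10.4, 3.6, -1.9, 2.5],
    "rejected": [True, False, True, True],
})

# Keep rows that are significant (rejected is True) AND have FC above 2.
mask = toy_regulation["rejected"] & (toy_regulation["FC"] > 2)
toy_up = toy_regulation.loc[mask, ["identifier"]]

print(toy_up["identifier"].tolist())
```

Selecting `["identifier"]` with a list keeps the result a one-column DataFrame, which is why `.shape` in the notebook reports `(199, 1)` rather than `(199,)`.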

Graph Database Connector

[221]:
from ckg.graphdb_connector import query_utils, connector
[222]:
driver = connector.getGraphDatabaseConnectionConfiguration()
[223]:
queries = query_utils.read_knowledge_queries()

1) Filter for Regulated Proteins Associated with Lung Cancer:

[224]:
selected_queries = query_utils.find_queries_involving_nodes(queries=queries, nodes=["Protein", "Disease"], print_pretty=True)
[225]:
selected_queries.head()
[225]:
Name Description involved_nodes involved_rels query example
id
Disease associated diseases in at least two of the pro... get relationships to diseases from a list of p... Protein,Disease ASSOCIATED_WITH MATCH (project:Project)-[:STUDIES_DISEASE]-(d:...
association_disease_score specific disease Return the list of proteins associated to a sp... Protein,Disease ASSOCIATED_WITH MATCH (protein:Protein)-[r]-(disease:Disease) ... proteins = ['A1BG~P04217','A2M~P01023','ACACB~...
[226]:
disease_query = selected_queries.loc["association_disease_score", "query"]
proteins = ['"{}"'.format(p) for p in up_regulated_proteins["identifier"].tolist()]
diseases = ['DOID:1324']
diseases = ['"{}"'.format(d) for d in diseases]
disease_query = disease_query.format(",".join(proteins),",".join(diseases), 1)
[227]:
proteins_associated_lung_cancer = connector.getCursorData(driver=driver, query=disease_query, parameters={})
[228]:
proteins_associated_lung_cancer.head()
[228]:
            node1        node2    source             type  weight
0  SLC44A1~Q8WWI5  lung cancer  DISEASES  ASSOCIATED_WITH   1.026
1     IRF6~O14896  lung cancer  DISEASES  ASSOCIATED_WITH   1.192
2   SPINT2~O43291  lung cancer  DISEASES  ASSOCIATED_WITH   1.216
3    KDM1A~O60341  lung cancer  DISEASES  ASSOCIATED_WITH   2.405
4  CEACAM5~P06731  lung cancer  DISEASES  ASSOCIATED_WITH   3.122
[229]:
proteins_associated_lung_cancer.shape
[229]:
(69, 5)
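
Cell [226] fills a stored query template by quoting each identifier, joining the quoted values with commas, and passing them to `str.format` in positional order. The sketch below shows that mechanism in isolation. `TEMPLATE`, `quote_all` and `fill_query` are hypothetical names, and the Cypher text is an assumption made for illustration; the real query ships with CKG and may look different.

```python
# Hypothetical template: the actual CKG query text may differ.
TEMPLATE = (
    "MATCH (protein:Protein)-[r:ASSOCIATED_WITH]-(disease:Disease) "
    "WHERE protein.id IN [{}] AND disease.id IN [{}] AND r.score > {} "
    "RETURN protein.id, disease.name, r.score"
)


def quote_all(values):
    """Wrap every value in double quotes so it becomes a Cypher string literal."""
    return ['"{}"'.format(v) for v in values]


def fill_query(template, proteins, diseases, min_score):
    """Fill the three positional placeholders: proteins, diseases, score."""
    return template.format(
        ",".join(quote_all(proteins)),
        ",".join(quote_all(diseases)),
        min_score,
    )


example_query = fill_query(
    TEMPLATE, ["KDM1A~O60341", "HDAC1~Q13547"], ["DOID:1324"], 1
)
print(example_query)
```

Because the values are spliced into the query string, this approach relies on the identifiers being trusted; the `parameters` argument of `connector.getCursorData` is left empty in the notebook.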

2) Identify Inhibitory Drugs for those Proteins

[230]:
selected_queries = query_utils.find_queries_involving_nodes(queries=queries, nodes=["Protein", "Drug"], print_pretty=True)
[231]:
selected_queries.head()
[231]:
Name Description involved_nodes involved_rels query example
id
Drug associated drugs in at least two of the protei... get relationships to drugs. Limit the result t... Protein,Drug ACTS_ON MATCH (protein:Protein)-[r:ACTS_ON]-(drug:Drug...
association_drug_intervention_proteins drug intervention- protein association Return associations between a list of proteins... Project,Protein,Clinical_variable,Drug HAD_INTERVENTION,ACTS_ON MATCH (project:Project)-[]-()-[:HAD_INTERVENTI... project_id = 'P0000002'\nproteins = ['A1BG~P04...
association_drug_interaction_score drug interaction association Return the list of drugs associated to the lis... Protein,Drug ACTS_ON MATCH (protein:Protein)-[r]-(drug:Drug) WHERE ... proteins = ['A1BG~P04217','A2M~P01023','ACACB~...
[232]:
proteins = ['"{}"'.format(p) for p in proteins_associated_lung_cancer['node1'].tolist()]
drug_query = selected_queries.loc["association_drug_interaction_score", "query"].format(",".join(proteins), 'inhibition', 0.8)
[233]:
drugs_proposed = connector.getCursorData(driver=driver, query=drug_query, parameters={})
[234]:
drugs_proposed.head()
[234]:
Drug_desc action drug_id node1 node2 source type weight
0 None inhibition DB04808 ANG~P03950 Neamine STITCH ACTS_ON 0.800
1 None inhibition DB02198 ANG~P03950 2-Bromoacetyl Group STITCH ACTS_ON 0.800
2 Losartan is an angiotensin II receptor blocker... inhibition DB00678 ANG~P03950 Losartan STITCH ACTS_ON 0.957
3 Tamoxifen is a non-steroidal antiestrogen used... inhibition DB00675 ANG~P03950 Tamoxifen STITCH ACTS_ON 0.990
4 A macrolide compound obtained from Streptomyce... inhibition DB00877 ANG~P03950 Sirolimus STITCH ACTS_ON 0.800
[235]:
drugs_proposed.shape
[235]:
(67, 8)

We can already see that CKG found the same inhibitory drug that was identified in the published case study. However, many other candidates are proposed, and they can be ranked further using other criteria.

[236]:
from ckg.analytics_core import utils
from ckg.analytics_core.viz import viz
[237]:
net = viz.get_network(data=drugs_proposed, identifier="inhibition_drugs", args={"source":"node1", "target":"node2", "values":"weight", "node_size":"degree","title":"Proposed drugs", "color_weight":False})
[238]:
viz.visualize_notebook_network(net["notebook"], notebook_type='jupyter', layout={'width':'100%', 'height':'700px'})
[239]:
utils.json_network_to_gml(net['net_json'], path='drug_network.gml')
[240]:
proteins = ['"{}"'.format(p) for p in proteins_associated_lung_cancer['node1'].tolist()]
drug_query = queries["association_drug_interaction_score"]["query"].format(",".join(proteins), 'inhibition', 0.9)
[241]:
drugs_proposed = connector.getCursorData(driver=driver, query=drug_query, parameters={})
[242]:
drugs_proposed.head()
[242]:
Drug_desc action drug_id node1 node2 source type weight
0 Losartan is an angiotensin II receptor blocker... inhibition DB00678 ANG~P03950 Losartan STITCH ACTS_ON 0.957
1 Tamoxifen is a non-steroidal antiestrogen used... inhibition DB00675 ANG~P03950 Tamoxifen STITCH ACTS_ON 0.990
2 Paclitaxel is a chemotherapeutic agent markete... inhibition DB01229 CDH1~P12830 Paclitaxel STITCH ACTS_ON 0.957
3 A major primary bile acid produced in the live... inhibition DB02659 CDH1~P12830 Cholic Acid STITCH ACTS_ON 0.957
4 Calcitriol is an active metabolite of vitamin ... inhibition DB00136 CDH17~Q12864 Calcitriol STITCH ACTS_ON 0.957
[243]:
drugs_proposed.shape
[243]:
(15, 8)
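
Raising the score cutoff from 0.8 to 0.9 shrinks the candidate list from 67 to 15 rows. The sketch below replays that effect on the five rows shown in output [234]. It assumes the comparison is inclusive, which is consistent with rows at exactly 0.800 and 0.900 appearing in the outputs; `filter_by_score` is a hypothetical helper, not a CKG function.

```python
# The five rows printed by drugs_proposed.head() at the 0.8 cutoff:
# (protein, drug, weight)
rows = [
    ("ANG~P03950", "Neamine", 0.800),
    ("ANG~P03950", "2-Bromoacetyl Group", 0.800),
    ("ANG~P03950", "Losartan", 0.957),
    ("ANG~P03950", "Tamoxifen", 0.990),
    ("ANG~P03950", "Sirolimus", 0.800),
]


def filter_by_score(records, min_score):
    """Keep records whose weight is at least min_score (inclusive, by assumption)."""
    return [r for r in records if r[2] >= min_score]


loose = filter_by_score(rows, 0.8)
strict = filter_by_score(rows, 0.9)

print(len(loose), [drug for _, drug, _ in strict])
```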
[244]:
net = viz.get_network(data=drugs_proposed, identifier="inhibition_drugs", args={"source":"node1", "target":"node2", "values":"weight", "node_size":"degree","title":"Proposed drugs", "color_weight":False})
[245]:
viz.visualize_notebook_network(net["notebook"], notebook_type='jupyter', layout={'width':'100%', 'height':'700px'})
[246]:
utils.json_network_to_gml(net['net_json'], path='drug_network_reduced.gml')

3) Identify Proposed Drug’s Known Side Effects

[247]:
selected_queries = query_utils.find_queries_involving_nodes(queries=queries, nodes=["Phenotype", "Drug"], print_pretty=True)
[248]:
selected_queries.head()
[248]:
Name Description involved_nodes involved_rels query example
id
association_drug_sideeffects drug side effect association Return the list of side effects linked to drugs Phenotype,Drug ASSOCIATED_WITH MATCH (sideeffect:Phenotype)-[r]-(drug:Drug) W... drugs = ['DB00439', 'DB06196']\ndrug_side_effe...
[249]:
drugs = drugs_proposed["drug_id"].unique()
drugs = ['"{}"'.format(d) for d in drugs]
sideeffects_query = selected_queries.loc["association_drug_sideeffects", "query"].format(",".join(drugs))
[250]:
side_effects = connector.getCursorData(driver=driver, query=sideeffects_query, parameters={})
[251]:
side_effects.head()
[251]:
      node1                            node2  source             type
0  Losartan  Abnormality of fluid regulation   SIDER  HAS_SIDE_EFFECT
1  Losartan                 Thrombocytopenia   SIDER  HAS_SIDE_EFFECT
2  Losartan                           Nausea   SIDER  HAS_SIDE_EFFECT
3  Losartan                     Palpitations   SIDER  HAS_SIDE_EFFECT
4  Losartan                        Arthritis   SIDER  HAS_SIDE_EFFECT
[252]:
side_effects.groupby('node1')['node2'].count()
[252]:
node1
Atorvastatin    152
Bleomycin        51
Calcitriol       59
Cholic Acid       7
Gemcitabine      94
Glyburide        63
Losartan        120
Paclitaxel      218
Tamoxifen       106
Tolbutamide      18
Vildagliptin     22
Vorinostat       42
Name: node2, dtype: int64
[253]:
net = viz.get_network(data=side_effects, identifier="side_effects", args={"source":"node1", "target":"node2", "node_size":"degree","title":"Proposed drugs", "color_weight":False})
[254]:
side_effects.head()
[254]:
      node1                            node2  source             type  width
0  Losartan  Abnormality of fluid regulation   SIDER  HAS_SIDE_EFFECT      1
1  Losartan                 Thrombocytopenia   SIDER  HAS_SIDE_EFFECT      1
2  Losartan                           Nausea   SIDER  HAS_SIDE_EFFECT      1
3  Losartan                     Palpitations   SIDER  HAS_SIDE_EFFECT      1
4  Losartan                        Arthritis   SIDER  HAS_SIDE_EFFECT      1
[255]:
utils.json_network_to_gml(net['net_json'], path='side_effects.gml')
[256]:
no_registered_side_effects = list(set(drugs_proposed['node2'].tolist()).difference(side_effects['node1'].tolist()))
[257]:
no_registered_side_effects
[257]:
['Trichostatin A', 'Resveratrol', 'dATP']
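
Cells [252] and [256] count side effects per drug and then take a set difference to find proposed drugs with no side-effect records. A minimal standard-library sketch of both steps follows, using a small illustrative subset of rows rather than the real query result.

```python
from collections import Counter

# Illustrative subset: proposed drug names and (drug, side effect) pairs.
proposed = ["Losartan", "Tamoxifen", "Trichostatin A", "Vorinostat", "dATP"]
side_effect_rows = [
    ("Losartan", "Nausea"),
    ("Losartan", "Palpitations"),
    ("Tamoxifen", "Nausea"),
    ("Vorinostat", "Diarrhea"),
]

# Number of recorded side effects per drug (the groupby/count step).
effects_per_drug = Counter(drug for drug, _ in side_effect_rows)

# Proposed drugs that never appear in the side-effect records.
drugs_with_effects = set(effects_per_drug)
missing = sorted(set(proposed) - drugs_with_effects)

print(dict(effects_per_drug), missing)
```

A drug in `missing` has no entries in the source database; that is an absence of records and not evidence that the drug is free of side effects.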

4) Reduce Adverse Response

[258]:
# pandas is used here but otherwise only imported in a later cell
import pandas as pd

treatment_regimens = pd.DataFrame(['Oxaliplatin',
                     'Capecitabine',
                     'Folinic acid',
                     'Fluorouracil'], columns=['treatment'])
[259]:
selected_queries = query_utils.find_queries_involving_nodes(queries=queries, nodes=["Clinical_variable", "Drug"], print_pretty=True)
[260]:
selected_queries.head()
[260]:
Name Description involved_nodes involved_rels query example
id
association_drug_intervention_proteins drug intervention- protein association Return associations between a list of proteins... Project,Protein,Clinical_variable,Drug HAD_INTERVENTION,ACTS_ON MATCH (project:Project)-[]-()-[:HAD_INTERVENTI... project_id = 'P0000002'\nproteins = ['A1BG~P04...
side_effects_jaccard_similarity_intervention_proposed_drugs similarity between side effects Return the jaccard similarity between drugs us... Drug,Clinical_variable HAS_SIDE_EFFECT MATCH (d1:Drug)-[:HAS_SIDE_EFFECT]->(phenotype... intervention = ['Capecitabine', 'Fluorouracil'...
[261]:
drugs_proposed
[261]:
Drug_desc action drug_id node1 node2 source type weight
0 Losartan is an angiotensin II receptor blocker... inhibition DB00678 ANG~P03950 Losartan STITCH ACTS_ON 0.957
1 Tamoxifen is a non-steroidal antiestrogen used... inhibition DB00675 ANG~P03950 Tamoxifen STITCH ACTS_ON 0.990
2 Paclitaxel is a chemotherapeutic agent markete... inhibition DB01229 CDH1~P12830 Paclitaxel STITCH ACTS_ON 0.957
3 A major primary bile acid produced in the live... inhibition DB02659 CDH1~P12830 Cholic Acid STITCH ACTS_ON 0.957
4 Calcitriol is an active metabolite of vitamin ... inhibition DB00136 CDH17~Q12864 Calcitriol STITCH ACTS_ON 0.957
5 A complex of related glycopeptide antibiotics ... inhibition DB00290 LIG3~P49916 Bleomycin STITCH ACTS_ON 0.958
6 Tolbutamide is an oral antihyperglycemic agent... inhibition DB01124 GCG~P01275 Tolbutamide STITCH ACTS_ON 0.957
7 Vildagliptin, previously identified as LAF237,... inhibition DB04876 GCG~P01275 Vildagliptin STITCH ACTS_ON 0.957
8 Glyburide is a second generation sulfonylurea ... inhibition DB01016 GCG~P01275 Glyburide STITCH ACTS_ON 0.957
9 None inhibition DB04297 HDAC1~Q13547 Trichostatin A STITCH ACTS_ON 0.938
10 Vorinostat (rINN) or suberoylanilide hydroxami... inhibition DB02546 HDAC1~Q13547 Vorinostat STITCH ACTS_ON 0.987
11 Gemcitabine is a nucleoside analog used as che... inhibition DB00441 CMPK1~P30085 Gemcitabine STITCH ACTS_ON 0.991
12 Resveratrol (3,5,4'-trihydroxystilbene) is a p... inhibition DB02709 PTGES~O14684 Resveratrol STITCH ACTS_ON 0.957
13 None inhibition DB03222 TXN~P10599 dATP STITCH ACTS_ON 0.900
14 Atorvastatin (Lipitor®), is a lipid-lowering d... inhibition DB01076 THBS1~P07996 Atorvastatin STITCH ACTS_ON 0.957
[262]:
treatment_list = ['"{}"'.format(t) for t in treatment_regimens['treatment'].tolist()]
proposed_list = ['"{}"'.format(t) for t in drugs_proposed["drug_id"].unique()]
q = selected_queries.loc['side_effects_jaccard_similarity_intervention_proposed_drugs', 'query'].replace("INTERVENTION",",".join(treatment_list)).replace("DRUG",",".join(proposed_list))
similarity = connector.getCursorData(driver=driver, query=q, parameters={})
[263]:
q
[263]:
'MATCH (d1:Drug)-[:HAS_SIDE_EFFECT]->(phenotype1) WHERE d1.name in ["Oxaliplatin","Capecitabine","Folinic acid","Fluorouracil"] WITH d1, collect(id(phenotype1)) as treatmentData MATCH (d2:Drug)-[:HAS_SIDE_EFFECT]->(phenotype2) WHERE d2.id IN ["DB00678","DB00675","DB01229","DB02659","DB00136","DB00290","DB01124","DB04876","DB01016","DB04297","DB02546","DB00441","DB02709","DB03222","DB01076"] AND d1 <> d2 WITH d1, treatmentData, d2, collect(id(phenotype2)) as proposedData, size(apoc.coll.intersection(treatmentData, collect(DISTINCT id(phenotype2)))) AS intersection RETURN d1.name AS from, d2.name AS to, intersection, gds.alpha.similarity.jaccard(proposedData, treatmentData) AS similarity ORDER BY similarity DESC\n'
[264]:
similarity = similarity[(similarity['from'].isin(treatment_regimens['treatment'].tolist())) & (similarity['to'].isin(side_effects['node1'].unique().tolist()))]
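# Average only the numeric columns; recent pandas versions raise an error when mean() meets the string column 'from'
similarity = similarity.groupby('to')[['intersection', 'similarity']].mean().sort_values(by='similarity')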
[265]:
similarity
[265]:
              intersection  similarity
to
Cholic Acid       5.666667    0.041012
Tolbutamide      11.000000    0.083821
Vildagliptin     15.000000    0.098243
Calcitriol       25.333333    0.156174
Vorinostat       26.000000    0.162747
Bleomycin        28.666667    0.192705
Glyburide        36.000000    0.219190
Tamoxifen        44.000000    0.223948
Gemcitabine      47.000000    0.256839
Atorvastatin     61.000000    0.277987
Losartan         61.000000    0.310073
Paclitaxel       97.000000    0.378702
[266]:
similarity = similarity[similarity['similarity']<0.2]
[267]:
similarity
[267]:
              intersection  similarity
to
Cholic Acid       5.666667    0.041012
Tolbutamide      11.000000    0.083821
Vildagliptin     15.000000    0.098243
Calcitriol       25.333333    0.156174
Vorinostat       26.000000    0.162747
Bleomycin        28.666667    0.192705
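
The ranking above rests on the Jaccard similarity between side-effect sets: the size of the intersection divided by the size of the union. Each proposed drug is compared against every drug in the treatment regimen, the scores are averaged, and candidates below 0.2 are kept. The sketch below is a plain-Python stand-in for that computation, not the graph procedure used in the query; the drug labels and side-effect sets are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical side-effect sets for treatment drugs (T*) and candidates (C*).
treatment_effects = {
    "T1": {"nausea", "fatigue", "anemia", "diarrhea"},
    "T2": {"nausea", "rash"},
}
candidate_effects = {
    "C1": {"nausea", "fatigue", "anemia"},
    "C2": {"headache"},
}

# Average each candidate's similarity over all treatment drugs.
mean_similarity = {}
for name, effects in candidate_effects.items():
    scores = [jaccard(effects, t) for t in treatment_effects.values()]
    mean_similarity[name] = sum(scores) / len(scores)

# Rank from least to most similar and keep candidates below the 0.2 cutoff.
ranked = sorted(mean_similarity.items(), key=lambda item: item[1])
kept = [name for name, score in ranked if score < 0.2]

print(ranked, kept)
```

A low score means the candidate shares few known side effects with the existing regimen, which is the rationale for preferring it under the heading "Reduce Adverse Response".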

5) Proposed Drug Candidates:


Vorinostat

Trichostatin A (https://www.drugbank.ca/drugs/DB04297)

7-[4-(Dimethylamino)Phenyl]-N-Hydroxy-4,6-Dimethyl-7-Oxo-2,4-Heptadienamide

[281]:
import pandas as pd
# Keep the two HDAC1 inhibitors; .copy() avoids SettingWithCopyWarning on the column assignments below
drug_result = drugs_proposed[drugs_proposed['node2'].isin(['Trichostatin A','Vorinostat'])].copy()
drug_result.columns = ["Drug_desc","Action","Drug_id","Protein","Drug_name","Drug_protein_source","Drug_interaction_type","Inhibition_score"]
# NOTE: .tolist().pop() returns only the last drug name (Vorinostat here), so both rows receive that drug's side effects and publications
drug_result['Side_effects'] = ";".join(side_effects[side_effects['node1']==drug_result["Drug_name"].tolist().pop()]['node2'].tolist())
# drugs_publications is not defined in the cells shown; presumably it was created in cells [268]-[280], which are missing from this export
drug_result['Publications'] = ";".join(drugs_publications[drugs_publications['node1']==drug_result['Drug_name'].tolist().pop()]['node2'].tolist())
# Disease association for the target protein (HDAC1)
protein_result = proteins_associated_lung_cancer[proteins_associated_lung_cancer['node1']==drug_result["Protein"].tolist().pop()]
protein_result.columns = ["Protein", "Disease", "Protein_disease_source", "Protein_disease_association_type", "Disease_score"]
# Join drug and disease evidence on the shared protein identifier
result = pd.merge(drug_result, protein_result, on='Protein')
result = result[["Protein", "Disease",
                 "Protein_disease_source", "Protein_disease_association_type",
                 "Disease_score", "Drug_name", "Drug_id", "Drug_desc", "Action", "Drug_protein_source","Drug_interaction_type","Inhibition_score",
                 "Side_effects", "Publications"
                ]]
[282]:
result
[282]:
Protein Disease Protein_disease_source Protein_disease_association_type Disease_score Drug_name Drug_id Drug_desc Action Drug_protein_source Drug_interaction_type Inhibition_score Side_effects Publications
0 HDAC1~Q13547 lung cancer DISEASES ASSOCIATED_WITH 2.51 Trichostatin A DB04297 None inhibition STITCH ACTS_ON 0.938 Alopecia of scalp;Erythroderma;Syncope;Diarrhe... PMID:32904337;PMID:31772153;PMID:27743148;PMID...
1 HDAC1~Q13547 lung cancer DISEASES ASSOCIATED_WITH 2.51 Vorinostat DB02546 Vorinostat (rINN) or suberoylanilide hydroxami... inhibition STITCH ACTS_ON 0.987 Alopecia of scalp;Erythroderma;Syncope;Diarrhe... PMID:32904337;PMID:31772153;PMID:27743148;PMID...
[283]:
regulation_result = regulation_table[regulation_table["identifier"]==drug_result["Protein"].tolist().pop()]
regulation_result
[283]:
-log10 pvalue FC Method T-statistics correction dfk dfn effsize efftype group1 ... identifier log2FC mean(group1) mean(group2) padj pvalue rejected s0 std(group1) std(group2)
1885 0.782856 10.367528 SAMR Two class paired -1.008813 permutation FDR (4.0 perm) 1 2 1.227 hedges CANCER ... HDAC1~Q13547 3.374 30.666 27.292 0.036735 0.164871 True 2 0.301 2.201

1 rows × 21 columns